Experiments in identifying frozen sentences

Authors

  • Graça Fernandes
  • Jorge Baptista
Abstract

This paper describes an experiment on the identification of frozen sentences (or verbal idioms) from European Portuguese in a large corpus of journalistic text. It aims to identify the main difficulties (or shortcomings) resulting from the intersection of linguistic information encoded in the lexicon-grammar with finite-state transducers that are then applied to texts. The paper shows that, for a selection of frozen sentences, this method is capable of identifying most instances of those idioms in the corpus. In most cases, only insertions of free elements (especially adverbs), which are external to the frozen construction, prevent better results than those obtained here, namely precision P=0.94 and recall R=0.88.
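For readers who want a single summary figure, the reported precision and recall can be combined into an F-measure. The sketch below is illustrative only: the abstract does not report an F-measure, and the function name `f_measure` and the choice of beta = 1 (the balanced harmonic mean) are assumptions made here, not part of the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 weights precision and recall equally (the usual F1).
    This helper is a hypothetical illustration, not the authors' code.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was matched
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract.
P, R = 0.94, 0.88
f1 = f_measure(P, R)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.909"
```

Under these assumptions, the two reported figures correspond to a balanced F-measure of about 0.91, a value derived here from the abstract's numbers rather than stated by the authors.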

Similar resources

Recognition of Persian Texts Using an n-gram Language Model and Grammatical Filtering

Text recognition has been one of the growing research topics in recent years. Much of this research has focused on the recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

Identifying Usage Expression Sentences in Consumer Product Reviews

In this paper we introduce the problem of identifying usage expression sentences in a consumer product review. We create a human-annotated gold standard dataset of 565 reviews spanning five distinct product categories. Our dataset consists of more than 3,000 annotated sentences. We further introduce a classification system to label sentences according to whether or not they describe some “usag...

Variables and Values in Children’s Early Word-combinations

A model of syntactic development proposes that children’s very first word-combinations are already generated via productive rules that express in syntactic form the relation between a predicate word and its semantic argument. An alternative hypothesis is that they learn frozen chunks. In Study 1 we analyzed a large sample of young children’s early two-word sentences comprising verbs with dir...

Identifying Non-Explicit Citing Sentences for Citation-Based Summarization

Identifying background (context) information in scientific articles can help scholars understand major contributions in their research area more easily. In this paper, we propose a general framework based on probabilistic inference to extract such context information from scientific papers. We model the sentences in an article and their lexical similarities as a Markov Random Field tuned to det...

BUCC 2017 Shared Task: a First Attempt Toward a Deep Learning Framework for Identifying Parallel Sentences in Comparable Corpora

This paper describes our participation in the BUCC 2017 shared task: identifying parallel sentences in comparable corpora. Our goal is to leverage continuous vector representations and distributional semantics with minimal use of external preprocessing and postprocessing tools. We report experiments that were conducted after transmitting our results.

Publication date: 2007